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ABSTRACT 

We describe the application of a statistical method to estimate submillimeter galaxy number counts 
from confusion limited observations by the Balloon-borne Large Aperture Submillimeter Telescope 
(BLAST). Our method is based on a maximum likelihood fit to the pixel histogram, sometimes called 
'P(Z3)', an approach which has been used before to probe faint counts, the difference being that here 
we advocate its use even for sources with relatively high signal-to-noise ratios. This method has an 
advantage over standard techniques of source extraction in providing an unbiased estimate of the 
counts from the bright end down to flux densities well below the confusion limit. We specifically 
analyse BLAST observations of a roughly 10 deg^ map centered on the Great Observatories Origins 
Deep Survey South field. We provide estimates of number counts at the three BLAST wavelengths, 
250, 350, and 500 /im; instead of counting sources in fiux bins we estimate the counts at several flux 
density nodes connected with power-laws. We observe a generally very steep slope for the counts 
of about —3.7 at 250 /xm and —4.5 at 350 and 500 /zm, over the range ~0.02-0.5Jy, breaking to a 
shallower slope below about 0.015 Jy at all three wavelengths. We also describe how to estimate the 
uncertainties and correlations in this method so that the results can be used for model-fitting. This 
method should be well-suited for analysis of data from the Herschel satellite. 

Subject headings: Submillimeter Galaxies - Cosmology: observations - Methods: data analysis 
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1. INTRODUCTION 

When the very first surveys are taken in any wave- 
length band, counting the number of sources found as 
a function of source apparent brightness is an efficient 
method for learning about the population of sources un- 
covered. Typically this approach provides clues much 
more rapidly than the painstaking work of identifying 
and studying the sources individually. If the sources 
lie nearby on a cosmic scale, one expects the number 
of sources per unit solid angle brighter than some limit- 
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ing fiux density S to vary as iV(> S) oc S^^^^, the Eu- 
clidean limit. Evolution in the volume density of sources, 
or their luminosity over time, causes a departure from 
the Euclidean slope, such that counts measurements can 
be used to infer information about the history of the 
population. In one of the first quantitative applications 
of this technique, Eddington wrote in The Large Scale 
Structure of the Universe (1911) that the Universe con- 
sists of 10^° or 10^^ stars surrounded by vast amounts of 
empty space. Clearly a large spatial inhomogeneity also 
generates non-Euclidean features, and this was a correct 
inference given that galaxies had not yet been discovered. 
On a cosmic scale we do not anticipate discovering such 
large inhomogeneities, but genuine clustering of sources 
and cosmic variance, where different regions happen to 
have different densities, both of which will effect mea- 
sured source counts. 

At 24 nm source counts follow a Euclidean distri- 
bution until just near the fain t est end of the deep- 
est s urvevs JShupe et all l2008t iPapovich et all 120041: 
Rodi ghiero et al .l l200(il : ICharv" et al .l l2004HMarleaii et al.l 
I2004f l. implying a fairly uniformly distributed stable 
population at the bright end. However, the 850 /zm- 
selected sources in SCUBA surveys f ollow a very 



steep broken power law distribution (see [Coppin et al 
'2006': Knudsen et al.' 2008: Small et al. 2002^: 'Scott et al 



2003: Borvs et al. 2003 : and also 
2009al at mm wavelengths). At 



J002; We bb et al 
lAustcrma nn et al. 
850 /im there is enough of a negative fc— correction that 
there is little variation in the apparent brightness of a 
source with a given luminosity at redshifts 1 < z < 8. 
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Furthermore, the increasing volume sampled at higher 
redshifts enables surveys at this wavelength to efficiently 
sample large numbers of distant objects. The contrast- 
ing shapes of the counts distributions at 24 and 850 /im 
implies both that the brightest sources ar e rare, and that 
their numbers have decreased over time (jPapovich et al.l 
2004t for measurements at intermedi ate wavelengths see 
Fraver et all 120091 : iDole et al.l I2004D . We report here 
surveys made with BLAST, the first statistically use- 
ful surveys in the crucial spectral range from 200 /im 
to 600 /i ni, near the peak of the Cosmic Infrared Back- 
ground (jPugeTeES EMS lEsenirS] lll^ These 
surveys will explore the transition between the nearby 
luminous galaxies and the distant starburst population. 

Counting objects in the sky is an endeavor which is 
probably as old as counting itself. In astronomy deter- 
mining the abundance of objects as a function of appar- 
ent brightness is often the easiest way to describe a pop- 
ulation, since detailed spectral information is usually re- 
quired in order to extract intrinsic properties of objects. 
Hence a great deal has been written about how to esti- 
mate 'number counts' efficiently. The process includes 
carrying out estimates of incompleteness, flux boosting 
and corrections for other sources of bias. 

Ra dio astronomers discovered in the 1950s (jScheueij 
Il957f) that one could use the statistical properties 
of observations of the sky to probe the counts of 
sour ces which are too faint to detect individually (see 
also [M urdoch et al.' ^1973; Scheucr I fzl iCondonI [1971 
iBarcons ,1992; .Takeuchi fc Ishii 200^ The 'probabil- 
ity of deflection' or P{D) distribution is essentially the 
histogram of pixel values in a map, and it depends on 
the underlying source counts. For simple distributions, 
particularly power-law counts, it is relatively easy to es- 
timate the amplitude and slope of the confused source 
counts. The conventional approach has been to count 
brighter objects directly and to carry out a P{D) analy- 
sis at the faint end. However, we have found that, at 
least in regimes where a flux boosting bias is impor- 
tant (ICoDDin et al.l[2?ffl5h . it is better to use a histogram- 
fitting procedure for the full range of source brightnesses. 
In other words if one wants to obtain a robust estimate 
of the source counts, it is better to avoid counting any 
objects at all, a somewhat counterintuitive result. 

This paper specifically examines data from the 
Balloon-borne Large Aperture Submillimeter Telescope 
(BLAST) at 250, 350 and 500 /im. A first estimate of 
th e counts at all three BLAST wavelengths was presented 
in iDevlin et al.l ()2009D . Here we present the method in 
much more detail, and perform a refined analysis includ- 
ing a comprehensive discussion of uncertainties and a 
discussion of clustering. 

Our early attempts to estimate BLAST source counts 
relied on the traditional approach of thresholding the 
maps in signal-to-noise ratio (S/N), extracting candi- 
date sources, and then estimating corrections for 'flux- 
boosting', reliability, incompleteness, etc. This approach 
did not yield useful results, even for S/N>5 sources. 
The BLAST data that we examined consist of a 2-tier 
survey, with a smaller region having much deeper inte- 
gration than the bulk of the map area. In practice we 
find it very difficult to match the counts for the range of 
flux densities where the 2 tiers overlap. Application of 
the method to simulated data-sets reveals strong biases 



in the estimated counts. We are therefore led to pur- 
sue other approaches, motivated additionally by earlier 
attempts to study the P{D) statistics of data from the 
SCUBA instrument, finding that a careful modeling of 
the counts to fit the pixel histogram achieves much more 
satisfactory results. 

We emphasize that in the S/N regime probed with 
BLAST, and future surveys such as those that will be un- 
dertaken by the Herschel satellite, the statistical fitting 
of the pixel histogram gives better results than standard 
techniques of source extraction over the full flux range. 

There have been many p revious P(-D)-style stud- 
ies of source counts (e.g. iFra ncesch ini et al.l [T989I: 
Wall et all Il98^ IBarcons &: Fab ian 19901: lOliver eTaLl 
1997t iMalonev et al.ll2005f) . but~tvpicallv they were re- 
stricted to studying the faint end of the counts, and 
using a single power-law for the underlying model. 
The closest study to our o wn in the literature is by 
iFriedmann fc Bouchetl (|2004D . Those authors developed 
a minimum approach and applied i t to simu lations of 
ISO data from the FIRBACK survey (jPuget et al. 1999) 
to fit a double power-law model. In this paper, we have 
pursued this approach and have developed a maximum 
likelihood method applied to data of significantly higher 
quality and quantity. We have uncovered a number of is- 
sues related to application to real data. We have also de- 
veloped a more efficient implementation of several steps 
in the analysis, as well as techniques to accurately esti- 
mate errors. Discussion of these details is likely to be 
helpful when applying a similar approach to even better 
data-sets. Our method accounts for issues related to re- 
alistic instrumental noise and pre-processing of the map. 
We provide solutions to deal with inhomogeneous and 
large scale noise in the map, and to correct the effect 
of optional map filtering. We also discuss in detail the 
choice of filter to apply to the maps for the optimization 
of source count estimation. 

We present the application of the method to multi- 
power-law count models using Markov Chains (e.g. 
Chib & Greenberg 1995) to sample the likelihood and 
provide an extended discussion of uncertainties and cor- 
relation among parameters. We discuss how to marginal- 
ize over the total background intensity, a quantity which 
is not accessible from the data and examine how to in- 
clude prior information on the background in the analy- 
sis. 

Our paper is organized as follows: we introduce the 
BLAST data in § [2l we present the model of observations 
and the main steps of the derivation of the probability 
function in § [S] The maximum likelihood method is de- 
veloped in § m and in § [5] we present the application to 
BLAST maps and provide estimates of the counts at 250, 
350, and 500 /im. The comparison with other data and 
extensions of the method are discussed in § [6] and [T] 

2. BLAST OBSERVATIONS 

BLAST is a stratospheric balloon-borne telescope in- 
corporating a 1.8-m primary mirror, and operating at 
an altitude of approximately 39 km. The focal plane 
is populated with three bolometer arrays observing in 
contiguous bands with central wavelengths of 250, 350, 
and 500 /im, essentially a prototype of the camera of the 
Spectral and Photometric Imaging Receiver (SPIRE) for 
Herschel (Griffin et al...2007. ). BLAST had two success- 
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ful scientific flights. Here we use data from the 11-day 
flight carried out in 2006 from McMurdo Station, Antarc- 
tica. The under-illuminated BLAST primary produced 
nearly diffraction-limited beams with full-width at half- 
maxima (FWHM) of 36" 42", and 60", at 250, 350, and 
500 fim, respectively. BLAST deep and wide blank-field 
surveys, hereafter BGS-Deep and BGS-Wide, were cen- 
tered on the Great Observatories Origins Deep Survey 
South (GOODS-S) field, which is at the centre of the 
Chandra Deep Field South (CDFS). A second interme- 
diate depth field near the South Ecliptic Pole was also 
surveyed. In a ddition, BLAST also targeted parts of 
the Milky Way (|Netterfield et al.ll2009l ) and some nearby 
galaxies. Several other observations of low-redshift clus- 
ters and high-redshift radio galaxies were made to sample 
biased star-forming regions of the Univer se. "'^'^ Further 
detail s of the instrument can be found in Pascal e et al.l 
), and the flight perf ormance and calibration for the 



2006 flight are provided in lTruch et all (|2009D . In this pa- 
per we focus on the BGS-Deep-|-Wide map, which covers 
an area of approximately lOdeg^. The deep part, nested 
inside the wide, has an area of 0.8 deg^ and is confusion 
limited in all three bands in the sense that the variance 
of the map at the scale of the point spread function is 
dominated by sources rather than noise. 

The processing of BLAST timestreams includes despik- 
ing, correcting time-varying detector responsivities, and 
deconvolving the effects of detector thermal time con- 
stants and audio frequency filtering. The absolute cal- 
ibration is based on regular observations of the evolved 
star VY CMa, which results in systematic uncertainties 
comm on to the three B LAST bands of approximately 
10% (iTruch et all I2OO90 . The cahbration uncertainty 
propagates directly to our counts estimates, and we ne- 
glect it from here on, since it only affects the compari- 
son between our results and those of other experiments. 
Pointing is reconstructed to an accuracy <5". Maps 
are prod uced using SANEPIC, a maximum likelihood 
method (Patanchon et al.] [2008') dealing with low fre- 
quency noise as well as noise correlations between de- 
tectors. Because of the modest scanning angle varia- 
tions of this particular field, residual correlated noise is 
still present at large angular scales after map-making. 
A relatively weak high-pass filter is therefore applied to 
the maps, suppressing signal on scales larger than about 
8'. The filter is anisotropic and stronger in the main 
cross-scan direction. Filtering has little impact on point 
sources and is accounted for in the analysis (see ? 13. 2p . 

3. MODEL OF THE OBSERVATIONS 

In this section, we present the main steps of the com- 
putation of the probability distribution function (PDF) 
with respect to models of submillimeter galaxy num- 
ber counts with typical observational parameters. De- 
tailed derivations and descriptions of the statistics of 
source confusion can be found in several articles (e.g. 
iTakeuchi et al.|[200lL and references therein). Here we 
give only a brief overview of the statistics of the 'P{D)' 
histogram. 



See http: //blastexperiment . info for more details. 

For an introduction to this t opic we recommend the Appendix 
of'Wall et ah (1982) or § 2 of Ta keuchi et ahl 1I2OOII') : see also § 9.1 
of , Trimble fc Ascliwandea I2005i') . 



3.1. The probability of deflection 

Let us define n{S) to be the differential number counts, 
i.e., the derivative of the cumulative source counts: 



n{S) 



dN{>S) 
dS ' 



where N(>S) is the total number of sources per unit solid 
angle with flux densities larger than S. Let us assume 
that we perform an observation at a random position 
ro on the sky. A point source at position r is observed 
with flux X ^ S X /(r — Tq), where /(Ar) is the beam 
function^^. Then, the mean number density of sources 
observed with a flux x is given by the well known result 
(|Condonlll97l : 



R{x) 



d\ 



/(r-ro); /(r-ro)- 



(2) 



Now let mj, be the total number of sources observed with 
a flux between Xk and Xk + Ax, and rfik the expected 
number of sources in the same flux bin. Assuming that 
Ax <C Xfc, we can write: 



nik = R{xk)Ax. 



(3) 



For this paper we assume that the sources are ran- 
domly distributed over the sky without spatial correla- 
tions on scales larger than a beam size. This assumption 
is discussed in § 16. 4p . In observed flux bin k, the prob- 
ability distribution of rrife follows Poisson statistics with 
mean rfik , 



Pk{mk) 



{R{xk)AxY 



ruk 



(4) 



We can also write the characteristic function, i.e., the 
Fourier transform of the probability distribution of the 
Poisson distribution, as follows: 



Pfc(r) =exp[i?(xfe)Aa;(e^^-l)]. 



(5) 



The total flux, ds^ from all the sources in the pixel can 
be written as a sum over all the flux bins, i.e. 



Xk rrik, 



(6) 



k=0 



fe=0 



where Sk is the total flux due to all sources with observed 
flux Xk ~ kAx. We want to obtain the PDF of ds- 
Given that the probability distribution of ds is the result 
of convolution of the probability distributions of Sk , then 
individual characteristic functions multiply. We can thus 
obtain from equations ([5]) and ^ the expression for the 
probability distribution of ds ■ 



p{ds) = T-^ jexp ^ R{x) e™^ dx 



(7) 



Rix)dx } (ds). 



In a pixelised map, /(Ar) is basically the result of the con- 
volution of the experimental beam function with the pixel window 
function. 
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Here we have taken the Hmit of the sum in equation ^ 
as an integral over dx, and w is the variable of the Fourier 
transform of p{ds). 

For most experiments, and explicitly for BLAST, the 
mean value of the flux density is not accessible to obser- 
vation. The measured signal, d, is the sum of the total 
response to the source flux, ds, an offset /i, which we 
assume is constant for all pixels, and instrumental and 
photon noise (which we will hereafter refer to as simply 
'noise'). Assuming the noise is Gaussian with standard 
deviation Un, the probability distribution of the observed 
signal is the convolution of the probability distribution 
of pure signal and of noise, which gives: 



p{d) =T-Ue^p\ / R{x) e*""^ dx 



R(£) dx + i{ix 



(8) 



where ii — — J xR{x)dx for a zero-mean distribution of 
pixel values in the map. In general /i is not known and 
so is a free parameter to estimate (or marginalize over). 
Note that we do not require the instrumental noise to be 
white. The model prediction for the probability distribu- 
tion is valid even in the case of correlated noise, provided 
it is described by Gaussian statistics. In that case the 
variance tr^ in the above equation would then be given 
by the integral of the noise power spectrum. 

Figure [T] shows the predicted PDF of noiseless obser- 
vations with the 250 /im BLAST beam for two differ- 
ent galaxy numbe r count models: a typical 2 power-law 
model at 250 ^m (jBorvs et al.ll2003l) and a single power- 
law model. The curves illustrate how sources with dif- 
ferent fluxes contribute to the PDF. There are 3 features 
to note: 

1. The very faint sources, for which the average num- 
ber is much larger than one source per beam, in- 
duce an almost Gaussian behavior, as the Poisson 
distribution tends to Gaussian for large numbers. 

2. Sources with fluxes for which there is about one 
source per beam have the greatest impact on the 
proflle of the distribution (the width in particular). 
They contribute signiflcantly to much higher fluxes 
in the histogram than their own flux because a large 
fraction of them are superimposed on each other. 

3. The bright sources contribute mainly to the pos- 
itive tail of the distribution, but also to the full 
range of pixel brightnesses, since sources can con- 
tribute everywhere in the beam. At the bright end, 
the probability distribution varies almost propor- 
tionally to the source counts for power-law type 
counts. This can be easily checked from equa- 
tion ([7]) - if only bright sources are present then 
exp[/ R{x) e™^dx] 1 + / R{x) e™=^da;, and so 
p{d) (X R{d) which is also oc n(d) {d ^ 0) for a 
power-law (see below). 

It is interesting to consider the case of a single power- 
law number counts model consisting of 



Here the number of sources per 'observed' flux x (equa- 
tion ([2])) is also a power-law: 

R{x) = fli.Kx-", (10) 
where J7b is the effective beam solid angle, defined as 



(11) 



(9) 



One can see that the Fourier transform of the probability 
distribution is anal ytic in this ca se. The formalism is de- 
scribed in detail in Con don|[l974 The resulting distribu- 
tion is an alpha-stable function (see[Herranz et al. 2004). 
This expression may be a good approximation when the 
number counts only mildly deviate from a power-law. 
However, in general, we need an approach that works for 
a much wider range of counts models. 

3.2. Dealing with map filtering 

The effect of a spatial filter being applied to the maps 
can be modeled entirely by introducing an effective beam 
which results from the convolution of the actual beam 
with the filter kernel. In most applications, a high-pass 
filter is applied to the maps in order to suppress any 
residual large scale noise, and this has been done for 
BLAST, see § El In this case, the kernel is a 'Mexican 
Hat' shape which suppresses the average value and low 
spatial frequencies, and the resulting effective beam takes 
both positive and negative values. Formally, the signal at 
a given pixel in the filtered map may be decomposed as 
the difference of measurements from two distinct regions 
in the sky: d = d+ — d-, with d+ being the result of 
the integral from the positive part of the effective beam 
only and c?_ from the negative part only (for which the 
absolute value is taken). 

The probability distribution of the measurements (pix- 
els) in the map is then the result of the convolution be- 
tween the two individual probability distributions: 

p{d)=p{d+)*p{-d^), (12) 

which gives, using equation ([8]): 

+ J R- (x) e-™^da; - J (i?+ (x) + i?_ {x))dx (13) 

Here i?+ and i?_ are computed from relation ([2]) for 
the absolute values of the positive and negative parts 
of the beam, respectively. Consequently, the probability 
distribution contains a negative tail, which can be un- 
derstood by realising that after high-pass filtering very 
bright sources induce negative shadows around their lo- 
cations in the map. However, this effect is barely seen in 
BLAST histograms, since the high-pass filter applied to 
the maps is relatively weak. 

Figure E] shows a predicted PDF for noiseless obser- 
vations with BLAST at 500 /xm, with and without high- 
pass filtering applied to the maps, as we have done in the 
analysis (see § [2l the filtering applied is weaker for the 
other two wavelengths). The effect of filtering is fully 
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Source flux density S [Jy] Measured flux density D (Jy/beam) 



Fig. 1. — The left panel shows two examples of differential number count models at 250 fim (solid curves) and the corresponding integrated 
number counts N{>S) (dot-dashed curves). The first model is a simple power-law of the counts dN/dS = NoS~'^'^, and the second model 
is a double power-law dN/dS oc [(S/So)^'^ + {S/ So)^'^]~^ with the transition flux So = 30mjy. The dotted line corresponds to one source 
per 250 /im BLAST beam area. The right panel shows for the different models the PDFs of pixel values for noiseless observations with the 
250 fim BLAST beam. The solid line corresponds to the simple power-law model, the dot-dashed line to the double power-law model, and 
the dashed line to the same single power-law model except that for sources between 0.2 and 0.4 Jy the differential counts have been set to 
zero (this is of course a very unrealistic model but the curve is informative nonetheless). One can see that at high fluxes the PDFs behave 
like the differential counts power-laws (see text). Note also that a very sharp transition in the counts appears a lot smoother in the PDF, 
even at relatively high flux densities. Even if no source with 0.3 Jy flux density is present, a signiflcant number of 0.3 Jybeam"^ pixels is 
expected, due to the probability of landing near bright sources, as well as the probability of havin g sm aller flux galaxies on top of each 
other. This also explains the correlations in the estimated number counts parameters described in § 15.31 
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Fig. 2. — The solid curve represents the predicted PDF (inte- 
gral normalized to unity) of noiseless observations for B LAST at 
500 /.im derived from our best flt model described in l5.2l and using 
our best estimate of the beam. The dot-dashed curve represents 
the predicted PDF taking into account the effect of high-pass fil- 
tering the map, as done in the analysis. The two curves are nearly 
identical for a; > 0. The dashed line represents the absolute value 
of the difference between the two PDFs which is about 5% for posi- 
tive X. Filtering induces a small negative tail to the histogram and 
affects the high flux part. 

taken into account in the analysis of all BLAST maps 
presented in this paper. 

In developing the techniques described in this section 
we have in mind experiments like BLAST which are 'total 
power' (even although the 'DC level' in a map is unmea- 
surable). However, the same formalism can be used to 
deal with maps resulting from experiments performing 
differential measurements, using a chop for example. In 
that case the effective beam can be described as a nega- 
tive side shifted by some angular distance from an iden- 
tical positive side. The histogram for a map made with 
a single difference 'double-beam' pattern will be (statis- 
tically) symmetric between positive and negative fluxes, 



and in practice this may lead to further complications in 
inverting the PDF. 

3.3. Number counts parametrization 

We have previously described how to obtain the PDF 
of pixel values from a given galaxy differential number 
counts function assuming that galaxy locations are un- 
correlated. For the analysis of BLAST maps (described 
in the next section) we need to do the opposite, i.e., the 
objective is to obtain the best estimate of the counts 
starting from the determination of the histogram of pixel 
values, which is a measure of the PDF. Because the rela- 
tionship between the counts at a given flux and the PDF 
at the same flux is far from straightforward (this is il- 
lustrated in Figure [1] which shows how the PDF changes 
after removing the contribution of galaxies whose fluxes 
are within a given flux range), degeneracies are expected 
between parameters. It is therefore necessary to model 
the counts with a very limited number of free parame- 
ters, as we would do in a deconvolution problem, and to 
consider the correlations between these parameters. 

We tried different phenomenological models, like sin- 
gle or double power-laws with a break. We finally chose 
to parametrize the differential number counts by a set of 
amplitudes at a few predefined fluxes, with the intervals 
between flux nodes interpolated with power-laws to im- 
pose continuity of the counts. The number and location 
of the nodes are chosen so that the errors on param- 
eters are neither too small (which would suggest that 
more nodes could be added) nor too large, while mak- 
ing sure that the quality of the overall fit is satisfactory 
and does not significantly improve with the addition of 
more nodes. For the BGS maps we found that no more 
than about 6 amplitude parameters per waveband can 
reasonably be estimated. 

4. PARAMETER ESTIMATION METHOD 

The parameter estimation method developed in 
this section is similar to the approach described by 
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iFriedmann fc Bouchetl (|2004f) . It is based on minimiz- 
ing the mismatch between the predicted and measured 
PDFs, i.e. histograms of the maps, in order to estimate 
the number count parameters and also the noise parame- 
ters. The approach is based on maximizing the likelihood 
of the data given the model. We denote by 9 the param- 
eters of the number counts model, which can take any 
form. Later we will focus on a specific model using a 
handful of nodes joined with power-laws. 

4.1. Likelihood of the data 

Let us assume that the different measurements in dif- 
ferent pixels of the map are independent. We will discuss 
this assumption in § 14.21 The likelihood of the data is 
the product of the probabilities of the individual mea- 
surements: 

L{d\9)^Ukp{dk\9), (14) 

where d = {di, ...dfc, ...} groups all the measured flux 
values in the different pixels, and p{dk\9) represents the 
PDF of individual measurements given by equation ([8]) . 
However, instead of equation p4|) . it will be easier to 
consider the log-likelihood: 

\ogL{9)^J2^og{pid,\9)). (15) 
fc 

Assuming that the noise is stationary, and that the prob- 
ability distribution does not vary much over a pixel flux 
density bin, then equation (jlSp becomes 

log L{9) ~ ^ Hi log{pi{9)) + constant, (16) 

i 

where rii is the number of pixels with flux densities in 
the ith flux bin interval, e.g. it is the histogram of the 
data, and pi is the result of the integral of the PDF in 
the ith bin, normalized such that J^iPi — 1- § 15.21 
where we show how non-stationarity can be taken into 
account. Let us define the quantity $ as the negative of 
the log-likelihood: 

$(0) = -^naog(K(0))-log(iV!) + ^log(n,!), (17) 

i i 

where N is the total number of measurements (i.e., pix- 
els). Now $ is equivalent to the negative logarithm of 
a multinomial distribution function (the last two terms 
in the equation being derived from the normalisation of 
the distribution) and is the quantity we will minimize in 
order to estimate the number counts. We notice the fol- 
lowing properties: (rii) = Npf, Var{ni} = Npiil — Pi); 
Cov{ni, rij} = —NpiPj (i ^ j). If the conditions 3> 1 
and Pi <C 1 are satisfied for all bins then the quantity $ 
becomes quadratic in n^: 




where the normalization constant K = {N/2) log(27r) + 
{l/2)'^-log{Npi). Note this quantity is simply a mea- 
sure of the mismatch between the probability distribu- 
tion function and the histogram of the data^^. 

IFriedmann fc Bouchct (2004) use a similar quadratic statistic 
(without the normalization term K) to fit for source count param- 



In order to estimate the parameters of the model, we 
minimize equation p?)) using a simple Markov Chain 
Monte Carlo Metropolis Hastings method (MCMCMH), 
which allo ws us to sample the hkel ihood around its max- 
imum (e.g. IChib fc rTree]iiberg|[T995f ). 

4.2. Approximate likelihood and error covariance 
prediction 

One of the assumptions made for the derivation of the 
likelihood in Eq. [14] is that observations, i.e., pixel val- 
ues in the map, are independent of each other. This is 
obviously not actually the case, because the beam corre- 
lates the signal in adjacent pixels for well sampled maps. 
So does any filtering applied to the maps and residual 
large scale noise. However, neglecting this effect does 
not introduce a bias in parameter estimation, because 
correlations in pixels only correlate measurements of the 
histogram and do not modify its expected shape. Nev- 
ertheless, neglecting the beam correlations reduces the 
performance of the method; another way to think of this 
is that sources will cause many neighboring pixels to be 
bright, but the information about the spatial distribution 
of bright pixels is lost if one only uses the pixel histogram. 
As we will describe below, we deal with this by smooth- 
ing our maps with the beam (see ? l5.ip and then using an 
estimate of the effective number of independent pixels. 

Measurements of the curvature of the log-likclihood, 
derived under the assumption of the incorrect correla- 
tions, will lead to an underestimate of parameter errors. 
This can be corrected, to a first approximation, by tak- 
ing for ni in the negative-log-likelihood expression (equa- 
tion (|17p ) the effective number of independent measure- 
ments in the map in flux bin i, which is approximately the 
number of pixels in flux bin i divided by the beam solid 
angle, measured in pixels^^. Also, N should be taken to 
be the total map area divided by the beam solid angle. 
This corresponds to applying to equation (frf]) a factor 
which is the inverse of the beam area in pixels. We have 
applied these corrections for the estimation of uncertain- 
tics in the next section. 

The error variance and covariance of parameters are 
obtained from sampling of the approximate likelihood 
with MCMCMH (see § We have also made a set 

of 60 Monte Carlo simulations of 250 /xm maps follow- 
ing the full processing procedure. This has allowed us to 
check the validity of the error variance prediction using 
our simple likelihood correction. We flnd that differences 
between our error estimations and the Monte Carlo sim- 
ulations are less that 40%, but this should be quantified 
more accurately with larger sets of Monte Carlo simula- 
tions. Although this has not been checked explicitly, we 
expect the likelihood approximation to also be valid for 
350 and 500 /im data. The likelihood contours between 

eters, but we prefer to use the actual log-likelihood of the data 
derived in equation l|17| l. which gives proper weights to the tails of 
the histogram and allows us to define very fine flux density binning. 

As described in ^ 15.11 each map has been filtered by the beam 
for better estimation of parameters. This means that the effec- 
tive correlation length is the beam width for the noise component 
and slightly more for the signal component (larger by \/2). We 
have chosen to use the beam width as the factor, which should 
lead to a mild underestimation of the errors (by a factor smaller 
than v^, but this has not been fully quantified with Monte Carlo 
simulations) . 
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pairs of parameters are also computed from the output 
of MCMCMH. 

Since we calculate the maximum likelihood estimate 
of the model, and assuming that the model holds for 
some value Oq of the parameters, it is also possible to 
obtain an estimate of the errors by computing the second 
derivatives of the likelihood. The asymptotic covariance 
matrix of the estimates is given by 

Cov(^) = {{{§ - eo)i9 - 0of}) ~ J{eo)-\ (19) 

where arc the estimated parameters and J{9q) is the 

Fisher information matrix, which is, following equa- 
tion ini), 

W'«n.^Nj:^l^l. (20, 

i 

In practice, the Fisher matrix is evaluated at the point 
of convergence 9. Again, N here is the number of beam 
areas in the map. 

4.3. Joint fit of maps with different depths 

In order to derive equation (|17p . which gives the quan- 
tity which must be minimized, we have made the assump- 
tion that the probability distribution function is the same 
for all observations (or pixels). In practice for BLAST 
this is not valid because the noise variance changes across 
the map. One could always minimize the full expres- 
sion in equation (|15p , but that would be extremely time- 
consuming, since one evaluation of the PDF would be 
required per observation. On the other hand, neglecting 
the non-stationarity of the noise would be sub-optimal 
since all pixels would have the same weight indepen- 
dently of their noise rms. Ignoring unequal weights would 
also bias the parameter estimates, since non-stationarity 
leads to a non-Gaussian histogram of the noise part, even 
for Gaussian noise. The solution we have adopted is to 
divide the observed maps into a limited number of zones 
such that in each zone the noise variance is approximately 
constant. We then compute histograms and the quantity 
in equation (|17p for each of the zones. The resulting cri- 
terion to minimize is then 

q 

with (j)q computed from equation (|17p for the noise vari- 
ance in zone q. 

4.4. Goodness of fit 

The quality of a particular parameter fit can be mea- 
sured using Eq. ([T5|) after rebinning the data such that 
rii ^ 1 while satisfying pi <^ 1. If the model holds we 
have the following relations: 

{<i>)o,^{N-ne)+K, (22) 

where ng is the total number of parameters we estimate, 
and K is the normalization term in (jlSp : and 

Var($) ~ (iV-ne). (23) 

These relations can be used to test the compatibility of 
any given model with the data. 



5. APPLICATION TO BLAST NUMBER COUNTS 

We now discuss the estimation of BLAST number 
counts using the method described in the previous sec- 
tion. First, we discuss the data and specific processing 
steps carried out on the maps prior to performing the 
likelihood analysis. 

5.1. Data preparation for P(D) analysis 

For source extraction from noisy data it is well known 
that one should use a matched filter on a well-sampled 
map. It has been shown that extraction of submilli- 
meter galaxies at low S/N levels can be performed ef- 
ficiently by using one of several variants of this tech- 
niqu e - either thresholding on a beam-correlated map 
(e.g. lEales et al. 1999: Borys et _al. 2003), or in the case 
of variable backgr ound, by using a Wiener filter (e.g. 
iPerera et all [2001 or its approximation via a M exican- 
hat (e.g. lBarnard et al.|[20Cil iChapin et al.|[2008l) . This 
works because beam-fitting is mathematically identical 
to finding maxima in beam-correlated maps, provided 
that the noise is white and the sources are unresolved. 
The Wiener filter simply suppresses any additional large- 
scale correlated noise, corresponding to a negative ring 
in the spatial filter. 

For P{D) analysis, the problem is very similar. In the 
low S/N regime (as in BGS-Wide), the histogram of pixel 
values will be dominated by noise, providing that pixels 
are small enough to fully sample the beam, which corre- 
sponds to 10" pixels for BLAST. The histogram depends 
on the chosen pixel size, which is obviously not satisfac- 
tory because then one is led to either choose between 
or combine results obtained using different pixel sizes in 
order to learn about structure in the map. Essentially, 
in a noisy fully sampled map there is information about 
faint structure which is encoded as a correlation between 
adjacent pixel values and this information is ignored in 
P{D) analysis. However, after filtering a well-sampled 
map with the beam kernel, the noise contribution to the 
width of the histogram is reduced by a factor which is 
approximately the number of pixels per FWHM of the 
beam^^, whereas the signal contribution is only reduced 
by approximately at least for a Gaussian beam. It 
then seems clear that in the noise-dominated regime, it 
is better to cross-correlate the maps with the beam ker- 
nel before P{D) analysis. Even though the map itself 
is likely more confused after this process, the resulting 
histogram effectively contains information about sources 
contributing over roughly a beam area of neighboring 
pixels. We have verified that this improvement is realised 
in practice, the errors on number count parameters be- 
ing reduced by a factor of about three at 250 /im for the 
bright end fluxes, and by a larger factor at longer wave- 
lengths and/or for fainter fluxes (after beam convolution 
of 10"pixel maps). 

In the opposite regime of source confusion dominat- 
ing over detector and sampling noise, it may be a bet- 
ter approach to partially deconvolve the map. In the 
extreme limit of a noise free map, one would simply de- 
convolve to obtain (5-functions and count the point source 
strengths. Probably the deconvolution criterion should 

Recall that the area of a Gaussian beam 2TTa^ ~ 
l.UFWHM^. 
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be that confusion noise becomes of order the instrumen- 
tal noise. However we have not carried out the extensive 
simulations which would be required to test this hypoth- 
esis. A relatively good choice might be to apply a Wiener 
filter to the confused the map. Nevertheless, there would 
appear to be no generally applicable solution, since one 
filter might be optimal for the low flux sources, whereas 
some different filter might be better for high flux sources. 
In practice these filters will depend on the counts them- 
selves (perhaps related to 51b from equation ([Tf] for ex- 
ample), so ultimately one might be forced to iterate to 
find the best solution. 

We have chosen to filter the BGS-Deep-|-Wide maps 
simply with the beam kernels at the appropriate wave- 
lengths, even if these are not strictly speaking the opti- 
mal filter for each map. We reiterate that this choice will 
not bias our results, it just means that our error bars are 
not optimal. 

Figure [3] shows the histogram derived from the BGS- 
Deep and Wide map at 250 fim using 10" pixels, before 
and after filtering with the beam kernel. The units of the 
two maps are Janskys per beam, such that if there was a 
source at a given location in the maps, a measure of its 
flux in Janskys could be read straight from the pixel at 
that location. The expected noise histograms, given the 
noise variance in each map, are also shown (also before 
and after filtering). From comparing these curves, we 
can anticipate a large improvement for P{D) analysis in 
using the filtered map with the beam kernel, as compared 
to using the raw 10" pixel map (which is equivalent to 
filtering the map with a 10" square filter), because the 
width of the histogram is significantly reduced. The his- 
togram is dominated by noise in the unfiltered 10" pixel 
map, and by confusion in the filtered map. For BGS- 
Wide the beam kernel is clearly a very good filter, since 
the map is dominated by noise even after filtering. 

Because of the large range of noise variance across the 
BGS field, we have divided the maps into 8 zones such 
that in each zone the noise variance can be assumed to be 
constant, following the approach described in 14.31 The 
boundaries were chosen so that variation of the variance 
in each zone has an rms of about ±10%. Thus, the fluc- 
tuations do not significantly modify the Gaussian shape 
of the expected noise histograms. In the end, most of 
the constraint on the number count parameters comes 
from two zones only: the deepest, containing ~ 5% of 
the total number of observed pixels; and a second zone, 
which covers most of BGS- Wide and contains ~ 80% of 
the pixels. Figure S] shows the variance map at 250 /zm 
with our eight regions superimposed. We can clearly see 
the two main variance zones. The other selected zones 
are, for the most part, located at the transition between 
Deep and Wide. We could have excluded them from our 
analysis with little effect on the parameter constraints, 
but there is no reason not to include this additional in- 
formation, provided these regions are treated correctly. 

The variance in the noise per pixel is computed in the 
map-making procedure by propagating the information 
fro m estimates of the tim estream noise power spectra 
(see lPatanchon et al.ll2008l for details). Some approxima- 
tions are made in the calculation, in particular, we im- 
plicitly assume that the residual noise in the final map is 
white. However, we know that map-making is not 100% 
successful in removing large scale noise in these data, 



partly due to the low cross-linking angle of the scans. 
Therefore, even after high pass filtering the map, a small 
fraction of correlated noise is expected to remain. It is 
important to notice that residual large scale noise simply 
increases the width of the expected noise histogram, but 
does not change its Gaussian shape, assuming the input 
noise was Gaussian. In order to minimize potential bi- 
ases in the number count parameters which might arise 
from underestimating the noise variance in the map, we 
choose to estimate the noise variance in every zone ex- 
cept the deepest one as additional free parameters. We 
justify the exception of the deepest zone below. This 
approach should be conservative, and in the end it has 
a very low impact on parameter estimation errors be- 
cause the faint end counts, which are mostly degenerate 
with noise in the noisier regions, are mainly measured in 
the deepest region. Consequently, noise variances in the 
other zones are very well determined through the P{D) 
analysis and fixing those would not bring significant ad- 
ditional constraints on the counts. At the faint end of 
the counts, since confusion dominates over measurement 
noise in the deepest region, errors of 10% in the noise 
variance (which is larger than we expect) would bias the 
estimated counts by only a fraction of la. We can con- 
fidently fix the noise variance parameter for the deep 
part to the predicted value without risk of biasing count 
parameters. We have found that the noise variance mea- 
sured by P{D) in the wide zone is about 5% larger than 
the variance measured in the pre-processing. Differences 
are larger in the zones located at the transition between 
BGS-Deep and BGS- Wide, but this is expected consid- 
ering the potential sources of additional systematics in 
regions coincident with scanning accelerations. 

We have verified the statistics of noise using jack-knife 
maps which were made by computing the difference be- 
tween two independent maps from the same region ob- 
served at different period during the flight. We did not 
find any evidence for departures from Gaussian instru- 
mental noise when examining the histogram of the dif- 
ference map. 

To build the histograms of data which are used for 
the fit, we use a very fine binning in flux density 
(< 1 mJy beam~^) such that variations of the PDF within 
bins is completely negligible. For such small bins the re- 
sults are independent of the bin size. As a result, in 
practice a large fraction of high flux bins receive either 
or 1 hit. 

5.2. Estimated number counts 

P{D) analysis is carried out by minimizing the 
negative-log- likelihood in equation (|17p . We first at- 
tempt to fit a single power-law model for the number 
counts: dN/dS = NqX (S/Sq)^. Best fit amplitudes and 
power-law indices for each of BLAST'S three wavelengths 
are given in Table [1] 

We find that the best fit single power-laws are very 
steep, the index /3 being —3.0 at 250 /im and —3.1 at 
350 and 500 fim. The strong departure from Euclidean 
number counts (/3 = —2.5) is an indication of strong 
evolution in the su bmill imeter galaxy popula tion. See 
iPascale et all (|2009[ ) and lMarsden et all (|2009f) for quan- 
titative measurements of these effects made via stacking. 
The steepening of the counts with wavelength suggests 
a significant contribution from high redshift galaxies to 
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Fig. 3. — Histograms of the BGS-Deep map (left panel) and BGS-Wide map (right panel) at 250 ^m, before and after cross-correlation 
with the beam kernel (dot-dashed and solid curves, respectively). The original map is computed with 10" pixels, which is slightly less than 
one third of the pixel size, and the units are Jansky per resolution element in each map. Dotted curves show the expected histograms for 
noise only (both filtered and unfiltered). The comparison shows that filtering the maps with the beam is a much better choice than using 
the raw 10" pixel map (even in the deep part where the signal to noise is larger), in the sense that the difference between the measured 
histogram and the noise variance is enhanced. This is because signal from sources is less affected by filtering than noise, which has power 
at higher frequency (see text). 



TABLE 1 
Best fit power-law number counts 





250 Atm 


350 /xm 


500 /im 




-3.005 ±0.022 


-3.119 ±0.024 


-3.101 ± 0.024 


So (mjy) 


7.5 


2.2 


1.8 


logATo [deg-2 Jy-i] 


6.036 ± 0.009 


7.383 ± 0.012 


7.269 ± 0.015 


•Smin (mJy) 


0.1 


0.05 


0.025 


Smax (mJy) 


1500 


550 


250 



Note. — Fitted models of differential number counts: dN/dS = Nq X 
(S/So)^ for the three wavelength. The log of the amplitude A^o and the 
index /3 are both estimated. The reference fiux density So is chosen for each 
wavelength such that errors on the two parameters are nearly decorrelated. 
We have limited the differential counts between S^nin ^md Smaxi which are 
given in the last 2 rows. The high flux limit Smax is set to be slightly larger 
than the brightest source observed at each wavelength. The value of 5niin 
is somewhat arbitrary; we selected a value which is low enough so that if 
we divide it be a factor of 2 the count slope does not change significantly. 

in the model, due to the steep slope of the counts. 

The need for a break in the slope towards fainter fluxes, 
so as not to overproduce the total cosmic infrared back- 
ground, shows that the single power-law model is unre- 
alistic. Moreover, it appears that such a simple model is 
not a very good fit to the data, especially at 250 /im, as 
we will discuss later. Measurement of a break to a flat 
slope at faint fluxes would indicate that the BLAST maps 
are nearly deep enough to capture the full intensity of the 
CIB, and would presage definitive results from Hershel. 
Since one would like to be able to constrain the num- 
ber counts over different intervals of source flux one is 
led naturally to consider more realistic models. We have 
chosen to fit power-laws for differential number counts 
within predefined flux density bins, as described in § 13.31 
A number of 6 distinct power-laws are estimated (a total 
of 7 free parameters) for the differential number counts 
at 250 /im, and 5 power-laws (6 parameters) at each of 
350 and 500 /im. The last and first power laws are tied 
to the first and last nodes, respectively. Number counts 
are set to zero below the first, and above the last nodes 
(they provide limits for the computation of the PDFs). 
The choice of flux density for these extreme nodes is set 
by requiring them to be very far from typical values con- 
strained by BLAST (e.g., the extreme nodes are set to 




0.0005 



Fig. 4. — Variance map at 250 fim. Contours delimit the different 
variance zones, except the higher variance one which concern a 
small fraction of pixel spread over the top of the map. 



the confused signal in the maps. However, these results 
are somewhat sensitive to the faint end flux cut imposed 
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Fig. 5. — Best fit differential number counts for the three BLAST 
wavelengths. Quoted error bars are derived from the marginal 
distribution for each parameter; because of non-Gaussian behavior 
of the likelihood around its maximum, the best fit model is not 
necessarily centered on the errors bars. The first and last power 
laws (at the faint and bright ends, shown by dotted lines) are 3o" 
upper limits (corresponding to 99.9% confidence regions for a 1- 
tailed Gaussian). There appears to be a shallower slope at both 
ends. 




IQ-' I ^ , 

0.01 0.10 1.00 

Flux [Jy] 



Fig. 6. — Best fit differential number counts inc ludin g FIRAS 
background priors. Compare with Figure|5]and see i) l5.4l for details 



ID""* Jy and 10 Jy at 250 /xm), such that results are 
independent of our particular choice. Best fit number 
counts for the three wavelengths are displayed in Fig- 
ure O The errors bars shown in the figure are computed 
from the 68% confidence intervals on the marginalized 
distributions of each parameter separately, the marginal 
distributions being estimated by sampling the likelihood 
with MCMCMH. Median values of each parameter de- 
rived from marginal distributions and 68% confidence in- 
tervals are given in Table [H These are not exactly the 
parameters of the best fit model, due to non-Gaussian 
likelihoods around the maximum. Pearson correlation 
matrices for the parameters are given in Tables HHB] 

There appears to be a change towards shallower slopes 
at the faint ends of the counts at all three wavelengths, 
and also at the bright end at 250 /xm. This bright end 
behaviour is consistent with the expectation that we are 
entering the Euclidean regime for Jy sources. At all 3 



wavelengths the counts are much steeper than Euclidean 
over most of the range of flux densities probed. The slope 
of the counts is close to —3.7 at 250 /itm over the flux den- 
sity range range 0.02-0.5 Jy, and —4.5 for the two longer 
BLAST wavelengths in approximately the same range. 
This is not compatible with the slope values fitted assum- 
ing single power laws. The break in the slope observed 
at the faint end is consistent with the requirement that 
the background not be overproduced. Since the intensity 
made by the counts is / = / S.{dN/dS)dS then we must 
have f3> — 2asS'^0in order to have convergence. 
This is not imposed in our method since the mean value 
of the probability distribution is set to 0, so we could have 
found un-physical solutions of /3< — 2. Figure [6] shows 
that there is some improvement in the faint end slope 
constraint, shown with a dotted line, when we include 
a prior on the total background intensity. In addition, 
as we discuss in § 15.41 imposing this prior also reduces 
parameter degeneracies. 

The gain in the quality of fit going from the single to 
multi-power-law model can be evaluated by measuring 
A^, which is the difference of the minimized quantity 
for the two models. Since the single power-law model is 
contained within the set of multi-power-law models, we 
expect A$/2 « 1 for each degree of freedom removed. 
We found that A$/2 = 15.5, 8.7, and 8 at 250, 350, and 
500 /xm, respectively, while the difference in the number 
of fit parameters is 5 at 250, and 4 at 350 and 500. The 
multi-power-law model is therefore a significantly better 
representation of the data. We have also checked that 
adding more number count parameters (by dividing the 
counts into more flux nodes) does not significantly im- 
prove the fit. We thus conclude that the BGS data can 
constrain ^ 6 parameters. 

We compare the predicted histograms (i.e., rescaled 
PDFs) of the best fit multi-power-law model, with the 
actual histograms of the maps in Figure \7\ We sepa- 
rately plot the histograms of the deep and wide zones 
and ignore the others here, which give only weak addi- 
tional constraints on the parameters. One can see that 
there is a very good match of the model to the data, 
considering that pixels of the maps are correlated on the 
beam scale, i.e., for about 3.5, 4.5, and 6 pixels of 10" 
size at 250, 350, 500 ^m, respectively. The apparent dis- 
crepancy in the histograms of the deep section of the 
map at fluxes around 0.1 Jybeam"^ at 250 /im (around 
0.05 Jy beam" at 500 fim) is in fact not very significant, 
because correlations of pixels in the map induce corre- 
lations in the histogram bins. However, some of the 
difference at the bright end of the deep counts may be 
real. On average, a significant fraction of the pixel flux 
at around 0.2Jybeam~^ (say, at 250 /im) comes from 
very bright sources which are observed at the edge of 
the beam (250 ^m b eam also has significant side lobes, 
see iTruch et al.l ()200 9) ; they are taken into account in 
this analysis), and it appears that the measured density 
of such sources in the deep region is a bit lower than 
the density observed in the wide region. This explains 
why at the bright end, the measured histograms of the 
deep sections are systematically lower than the predic- 
tion from the best fit model, even at moderate fiuxes. 
This is partly explained through the historical choice of 
this region (initially the Chandra Deep Field South) to be 
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TABLE 2 

Best fit differential number counts 



250 ^tm 

Node Best fit Marginal 
(Jy) log[deg-2 Jy-i] 



350 /xm 

Node Best fit Marginal 
(Jy) log[deg-2 Jy-i] 



Node 

(Jy) 



500 ^lm 
Best fit Marginal 
log[deg-2 Jy-i] 



1.0 X 10- 
0.02 

0.1 
0.25 

0.5 

1.2 

10 



3.64 
5.65 
2.45 
1.49 
-0.13 
-0.27 
-11.77 



< 10.28 (3(t) 

c: t;o+0.07 
t>-O8_0.11 

2 51+0-17 
1 41+0.20 

-0 10+0-43 
"■^'-'-0.74, 
fl 07+0.38 

"•■-"-o.se 
3.23 (3o-) 



5.0 X 10' 
0.015 
0.05 
0.15 
0.5 
5 



5.65 
5.75 
3.17 
1.29 
-0.29 
-6.80 



< 11.12 (3(t) 

c OO+0.14 

0-»8_Q jg 

o OO+0.39 
^■^^-0.57 
^^26+0.32 



< 



-0.23; 
< -2.19 (3cr) 



-0.43 
1-0.36 
-0.45 



2.5 X 10-^ 
0.01 
0.03 
0.08 
0.25 
2.5 



7.47 
5.85 
3.81 
1.85 
-0.06 
-12.56 



< 11.03 (3o-) 
5 Q7+0.17 
''•^'-0.21 

q fiQ+0.36 
'-'•"•-'-0.58 
1 90+0-40 
_n 0^-1-0.53 

< -2.27 (3cr) 



Note. — Differential number counts are parametrized by the log of the amplitude at fixed fiux density nodes, and 
filled with connected power-laws. Best fit parameters are given for the three wavelengths, as well as median values of 
the marginal probability distribution for each parameter. Quoted uncertainties are 68% confidence intervals (except 
for the first parameters for which we give 3(t upper limits, corresponding to 99.9% confidence), derived from the 
marginal distribution for each parameter. Errors on parameters are highly correlated (see Tables |4}{6] for Pearson 
correlation coefficients). 



TABLE 3 

Best fit differential number counts with background constraint 



Node 

(Jy) 



250 ^tm 
Best fit Marginal 
log[deg-2 Jy-l] 



Node 

(Jy) 



350 i^m 
Best fit Marginal 
log[deg-2 Jy-l] 



500 ^lm 

Node Best fit Marginal 
(Jy) log[deg-2 Jy-l] 



1.0 X 10" 
0.02 

0.1 
0.25 

0.5 
1.2 

10 



8.44 
5.35 
2.73 
1.32 
-0.02 
-0.30 
-21.20 



< 9.28 {3a) 
5 4Q+0-13 

+0.16 
^-"^'-0.19 
, -1-0.20 
-0.24 
,-f0.49 
-1.01 
-,-1-0.37 
-0.39 



i.4i: 
-0.22;! 

-0.22^ 



5.0 X 10-5 
0.015 
0.05 
0.15 
0.5 



8.32 
5.67 
3.26 
1.19 
-0.12 



< 9.42 {3a) 

5+0.13 
'-0.17 
hO.39 
-0.57 



5.i 

2.85^ 



1.28 
-0.24 



+0.32 
0.49 
fO.38 
-0.46 



< -3.33 {3a) 



-10.90 <-2.46(3o-) 



2.5 X 10" 
0.01 
0.03 
0.08 
0.25 
2.5 



8.66 
5.72 
3.94 
1.77 
0.00 



< 9.67 {3a) 
5.95 



3.62 
1.87 

-0.32' 



+0.18 
0.20 
+0.44 
0.66 
+0.52 
-1.08 
0.61 
0.81 



-16.01 < -1.75(3(7) 



Note. — Same as Table [2] but using background constraints coming from using the EZRAS measurement as 
a prior in the P{D) analysis. Some of the error bars increase a little after adding the EZRAS constraint. This is 
because some parameters are slightly lowered when we add the prior and the corresponding amplitudes have a larger 
probability of being very close to zero. 
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ElG. 7. — ZZistogram of pixel values for the deep and wide zones (which correspond closely with BGS-Deep and BGS-Wide) compared 
with prediction from the best fit model of the differential counts (dot-dashed line for deep and solid line for wide). The bins in flux density 
are chosen to be different for the deep and wide histograms for clarity, and are both much larger that the binning used for parameter 
fitting. This figure shows the very good fit of the model to the data at all three wavelengths. The apparent discrepancy at the bright end 
for the histograms of the deep part is not very significant, because of the large correlations of values between bins in the histograms. We 
can see that the deep region, which is confusion limited (the contribution of the noise to the histogram is smaller than the source confusion 
contribution), contains almost all the information on faint sources, which are within the noise regime of the wide histogram. At the bright 
end, the Wide part carries almost all the information about sources above 0.2 Jy at 250 ^m, and 0.1 Jy at 350 and 500 /^m, showing the 
importance of sky coverage. 



devoid of bright sources (although at other wavelengths 
and over a smaller field, making the strength of any bias 
hard to assess). 

5.3. Degeneracies 



We find that the errors on the number count parame- 
ters are highly correlated, and the likelihood around the 
maximum has a very non-Gaussian behavior for some 
parameters. The correlation is negative for two adjacent 
nodes in the counts. This is expected, since sources at a 
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TABLE 4 

Pearson Correlation matrix for the parametrized dN/dS 
MODEL AT 250 /^m. 



Node 


10-4 


0.02 


0.1 


0.25 


0.5 


1.2 


10 


10-* 


1.00 


-0.85 


0.44 


-0.19 


0.08 


-0.06 


-0.04 


0.02 


-0.89 


1.00 


-0.77 


0.42 


-0.17 


0.10 


0.03 


0.1 


0.55 


-0.80 


1.00 


-0.67 


0.28 


-0.17 


0.03 


0.25 


-0.25 


0.43 


-0.67 


1.00 


-0.60 


0.37 


-0.07 


0.5 


0.04 


-0.10 


0.21 


-0.54 


1.00 


-0.72 


0.16 


1.2 


-0.05 


0.06 


-0.12 


0.35 


-0.73 


1.00 


-0.37 


10 


0.02 


-0.03 


0.07 


-0.16 


0.21 


-0.39 


1.00 



can be zero with a probability which is not entirely neg- 
ligible in some flux bins in which case the parameter 
values, which are the logarithm of amplitudes, can take 
very negative values. We have set quite low thresholds 
for the faintest node to prevent parameters taking val- 
ues which are too low, based on the physical assumption 
that number counts should be described by a relatively 
smooth function. The lower errors for some parameters 
are slightly dependent on these thresholds. This is a 
complication introduced by the choice of parametrizing 
counts with the log of amplitudes, but this parametriza- 
tion seems justified considering the power-law nature of 
counts. 

Parameter correlations can be seen more explicitly in 
Figure [H which shows two-dimensional likelihood con- 
tours for pairs of parameters at 250 /im. One can see 
that none of the contours are well described by ellipses. 

The degeneracies between parameters are such that 
number counts can be estimated only in a very limited 
number of bins (of the order of 6 for BLAST). The reso- 
lution limit of the counts strongly increases with the res- 
olution of the maps, since improved resolution leads to 
a more direct relationship between the probability dis- 
tribution and the number counts. The SPIRE instru- 
ment on Herschel, with twice the resolution of BLAST, 
should allow significantly finer binning of the counts and 
should greatly improve statistics because of dramatically 
extended coverage of the sky. 

5.4. Including constraints from FIRAS 

For the three BLAST wavelengths, the lowest flux 
node can be extremely low without limit and still give 
a good fit. In other words the BLAST data arc com- 
patible with a model predicting no sources in the first 
bin. Nevertheless BLAST shows without ambiguity that 
there is a break towards the faint end for all three wave- 
lengths. The number of these fainter sources on the sky 
has a stronger impact on the total intensity of the back- 
ground radiation, which is not directly measurable by 
BLAST, than on the width of its distribution. It ap- 
pears that some of the models which are compatible with 
BLAST histograms produce a GI B which is inconsis tent 
with FIRAS meas urements (see IPuget et al.l 119961 and 
iFixsen et al.lll998( ) at these wavelengths. 

We have used FIRAS constraints pubhshed in 
iFixsen et al.l ()1998[ ) as a prior in the P{D) analysis in 
order to select against models which have an unphysi- 
cally small abundance of faint sources. Even though the 
FIRAS measurements have quite large uncertainties, the 
additional constraint is helpful in breaking this degener- 
acy. BLAST number counts including the FIRAS con- 
straints are shown in Table [3] and in Figure [6l with the 
Pearson correlation matrix given in the lower triangular 
parts of Tables [3Hni Estimation of the number counts is 
improved at the faint end after adding the FIRAS con- 
straint. The break in the counts at the faint end is also 
more clearly detected here than in the case without pri- 
ors. As an example two-dimensional contours including 
FIRAS priors are shown for the first two parameters at 
250 ^m in Figure [5] (top-right panel). 

6. DISCUSSION 
6.1. Comparison with other data 



Note. — Coefficients arc computed for BLAST only 
(upper triangular matrix) and BLAST + FIRAS back- 
ground constraints (lower triangular matrix) following dj = 

JlrPiPil \/^rPi Y.rP'j' where Pi and pj are parameter number i 
and j, and r is the realization number. Node flux units are Jansky. 



TABLE 5 

Pearson Correlation matrix for the parametrized dN/dS 
model at 350 /xm. 



Node [Jy] 


5 X 10-5 


0.015 


0.05 


0.15 


0.5 


5 


5 X 10- '"^ 


1.00 


-0.88 


0.51 


-0.17 


0.11 


-0.04 


0.015 


-0.85 


1.00 


-0.75 


0.33 


-0.20 


0.03 


0.05 


0.43 


-0.78 


1.00 


-0.51 


0.32 


-0.03 


0.15 


-0.16 


0.39 


-0.52 


1.00 


-0.70 


0.10 


0.5 


0.06 


-0.23 


0.35 


-0.71 


1.00 


-0.25 


5 


0.02 


0.02 


-0.05 


0.20 


-0.28 


1.00 



Note. — Coefficients are computed for BLAST only (upper 
triangular matrix) and BLAST + FIRAS background constraints 
(lower triangular matrix). 



TABLE 6 

Pearson Correlation matrix for the parametrized dN/dS 
MODEL at 500 /im. 



Node [Jy] 


0.000025 


0.01 


0.03 


0.08 


0.25 


2.5 


0.000025 


1.00 


-0.82 


0.52 


-0.34 


0.14 


-0.03 


0.01 


-0.71 


1.00 


-0.72 


0.53 


-0.25 


0.02 


0.03 


0.32 


-0.82 


1.00 


-0.69 


0.39 


-0.03 


0.08 


-0.15 


0.61 


-0.70 


1.00 


-0.59 


0.03 


0.25 


0.03 


-0.33 


0.49 


0.63 


1.00 


-0.12 


2.5 


-0.01 


-0.01 


0.02 


0.02 


-0.07 


1.00 



Note. — Coefficients are computed for BLAST only (upper 
triangular matrix) and BLAST + FIRAS background constraints 
(lower triangular matrix). 



given flux contribute to a large range of pixel fluxes in the 
map, due to confusion (as illustrated by Figure [1] show- 
ing how the histogram is modified after removing the 
contribution of sources of a given flux) , and hence shift- 
ing some sources out of one flux bin and into the next 
will lead to two histograms with approximately the same 
shape. This anticorrelation is stronger for lower flux den- 
sity nodes. This is again completely understood - faint 
sources produce a nearly Gaussian histogram, with no 
particular structure allowing us to distinguish between 
the number of sources and their flux. 

The degree of correlation is evaluated using the Pear- 
son correlation matrix, with results given in Tables [4]-[6l 
However, some precaution should be taken in using these 
numbers because of the locally non-Gaussian likelihood 
around its maximum. Some error bars are significantly 
asymmetric, as seen in Figure [5l and more elongated on 
the lower side. This is due to the fact that number counts 
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Fig. 8. — Likelihood contours for pairs of parameters associated 
with adjacent nodes at 250 /xm. The two curves in each panel 
represent 68% and 95% intervals estimated from sampling the 
likelihood with MCMCMH. Parameter units for each graph are 
log[deg~-^ Jy"-"^]. All panels are for P(D) analysis using BLAST 
data only, except for the second panel (top right side) , which shows 
the likelihood contours for the first two parameters (i.e., the faint 
end nodes) after adding FIRAS background constraints. Crosses 
indicate best fit parameter values. P{D) analysis provides only 
upper limits for number counts at the first and last nodes. The 
amplitude of the counts at 0.5 Jy can be zero with a low but not 
completely negligible probability. 

At 250 Ilia BLAST has provided the only existing im- 
ages of the sky, and hence the counts estimates are 
unique. However, it is possible (although challenging) 
to obtain data from the best ground-based facilities 
operating in the 350 fim and 450 fim atmospheric win- 
dows. Several studies have been published using SCUBA; 



ISmail et al.l (pOOl : IBorvs et all (1200. 



iKnudsc n et al 

(20061) or SHARC; |Khaa_eLalJ 

(|2008). These counts estimates are based on a few to 
a handful of sources, and both calibration and reliability 
are issues for interpreting the data. By flying above the 
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Fig. 9. — Best fit cumulative (or integral) BLAST counts, com- 
pared with published estimates using ground-based facilities at 
350 fim and 450 fim. The three curves represent the BLAST cumu- 
lative counts at the three wavelength (250 fim is in green 350 /im in 
blue, and 500 ^m in red) derived by integrating the best fit multi- 
power-law models including FIRAS constraints. Interval of 68% 
confidence (which appear without central dot in Figure) are esti- 
mated at each flux node that we have defined in our differential 
count model, by measuring the dispersion of the likelihood sam- 
pling by MCMCMH. Other data at 350 fim and 500 fj,m are shown 
in blue and red, respectively. Best fit models from BLAST are 
not necessarily centered on error bars due to significant departure 
of the likelihood from Gaussian. The lines indicate upper limits 
of cumulative counts at the fainter nodes of 10"** Jy at 250 ^m, 
5 X 10~^ Jy at 350 /^m and 2.5 X 10~^ Jy at 500 /^m. Upper and 
lower limits from other experiments are Icr limits. No waveband 
correction has been applied to the 450 ^m counts for the compar- 
ison with BLAST counts; those measurements should in principle 
lie somewhere between BLAST 350 and 500 ^tm counts 

bulk of the atmosphere, BLAST was able to increase the 
number of detections (or overall statistics on the counts) 
at these wavelengths by about 2 orders of magnitude. 

BLAST data are sufficient to allow us to determine 
differential counts, whereas all published estimates at 
similar wavelengths have been cumulative counts, i.e., 
N{>S). In Figure [H] we compare our best fit estimate 
for the cumulative counts with the published estimates at 
350 and 450 ^m. Note that there is an additional cal- 
ibration uncertainty when comparing with other experi- 
ments (as well as issues of different waveband filter pro- 
files). BLAST cumulative counts shown in Figure [9] are 
computed by integrating the best fit differential counts, 
and errorbars are estimated from the dispersion of count 
amplitudes at each flux node. 

One can see that BLAST provides an accurate mea- 
surement of the counts over a wide range of flux densities. 
This is due to an unprecedented signal-to-noise ratio and 
sky coverage delivered at these wavelengths. 

6.2. Comparison with models 

We have plotted BLAST differential source counts ob- 
tained with a constraint from the total intensity of the 
CIB, which are listed in Table [3l in Figure [TOl The c urves 
in the flgure are the models of TLagache et al.l ()2004f ). ob- 
tained from their web page^°, dated 5 Aug. 2008 ). The 
models in this regime are largely the sum of two compo- 

^^ [http: //www. las .u-psud. f r/lrgalaxies/Model/#SourceCounts | 
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Fig. 10.— The BLAST difTcrcntial source counts in Table \3\ at 
all thr ee wavelengths are compared to the models of lLagache et all 
II2004I ) . The arrows at the right in each panel denote 99.9% upper 
limits from the highest flux density nodes listed in Table |3] The 
dashed lines at the left connect the lowest node plotted to the 99.9% 
confidence upper limit for an additional node at approximately 
100 fi3j, also listed in Table \3\ The overall agreement is quite 
striking. However, the model predicts more sources at 250 and 
350 fira than are observed and at all wavelengths the counts fall 
more steeply with flux than the model predicts. 

ncnts, starburst and quiescent galaxies, with starbursts 
dominating in the middle of the figure and quiescent 
galaxies forming the bulk of the flatter tail at high flux 
densities. The overall agreement is striking given the pre- 
cision of the data, especially noting that this is the first 
data-set available at these wavelengths and the model 
has not been tuned to fit these new data. However, the 
model does overpredict (by a factor ~10) the number of 
sources with flux densities around 0.1-0.2 Jy at 250 and 
350 /im, and at all three wavelengths the measured slope 
is steeper than predicted by about half a unit. 

6.3. Comparison with other methods 

Standard techniques to estimate number counts are 
based on extracting sources by finding peaks in a beam- 
correlated map. The measured flux density of sources 
must be corrected for biases like flux-boosting caused 
by applying a S/N threshold on noisy data with steeply 
falling counts, as well as boosting due to confusion with 
fainter sources. Derived number counts must also be cor- 
rected for incompleteness and false identifications. In any 
highly confused maps as for BLAST, these biases are 
large and strongly dependent on the underlying counts. 
So, although it possible to account for some of the ef- 



fects fe.g. lCoppin et al.ll2005l : lAustermann et al.ll2009bf) . 

in the end one has to carry out extensive simulations, 
which effectively reproduces the forward-modelling of the 
pixel histogram which we describe in this paper. More- 
over, counting objects above some rms level does not use 
all of the information in the map. 

Figure [TT] shows the number counts estimated in 
BLAST maps by counting objects but without applying 
any bias correction, and compares it with counts esti- 
mated via the P{D) analysis. One can see that biases 
are huge and even bigger than the counts themselves at 
flux densities lower than about 0.1-0.2 Jy at all three 
wavelengths. 

One could imagine an iterative approach for which the 
bias factor correction would be derived from the debi- 
ased number counts, but one would end up approaching 
the full P{D) analysis. The method presented in this pa- 
per efficiently provides unbiased estimates of the counts 
over the full range in fiux density (note in Figure [TT] 
the good match of the counts with results from simu- 
lation using the best fit model). It is therefore highly 
recommended for confusion limited observations or for 
multi-tier surveys where some of the layers suffer strong 
confusion. Nevertheless, it may be that direct source es- 
timation methods are a little more efficient for the very 
brightest sources, for which the probability of overlap 
is negligible. That is because this makes complete use 
of information from all pixels near the peaks. In prac- 
tice this mild advantage for bright objects is probably of 
limited use, since for model fitting one needs a single ap- 
proach which spans the full range of source brightnesses. 
On the other hand source extraction and recipes to cor- 
rect for biases are still needed for constructing catalogues 
and matching to other wavebands. The point is that it is 
the pixel information and not the source catalogue which 
should be used to estimate the counts. 

6.4. Clustering of sources 

In the analysis presented in this paper, we have as- 
sumed that galaxies are randomly and independently dis- 
tributed over the sky. Nevertheless, we know that all 
galaxies are clustered, and in fact significant correlations 
have been found i n the background of the BLAST maps 
(jViero et al.ll2009l ). which are probably due to clustering 
on scales larger than the BLAST beam. 

The way that the probability distribution is modified 
depends strongly on the angular scale being considered. 
Clustering on scales much larger than the beam will make 
the P{D) distribution wider overall. On the other hand, 
clustering around the beamsize (and smaller) will distort 
the distribution in a way which depends much more on 
details of the clustering model. 

The effects can be computed for a given number counts 
model, provided that all the n- point sta t istics of the 
source distribution are known. iBarconsI (|1992D com- 
puted this for s pecifi c toy models of clustering, while 
iToffolatti et al.l (jl998D focussed on m odelling of the 3- 
dimensional clustering of the sources. iTakeuchi fclshiil 
(2004) describe how to estimate the effects on the mo- 
ments of the P{D) distribution by performing appropri- 
ate integrals over the angular 2-point correlation func- 
tion w{9), as well as the higher n-point correlations. 
They show that for realistic source clustering the ef- 
fects of w{9) dominate over those of higher-order cor- 
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Fig. 11. — Comparison of Euclidean normalized differential counts with source catalogues at the three BLAST wavelengths. The dashed 
curves are best fit differential counts from the P{D) analysis including FIRAS constraints. The solid curves are differential counts estimated 
by extracting 4-(t sources from the BLAST maps and without applying any bias correction. The shaded region gives 95% intervals for 
simulated A-a catalogue counts distribution using multiple realizations of the BLAST maps. These shaded regions show what we should 
measure in the 4-(t catalogues if the dashed line is the correct underlying counts model. The negative bias at low flux densities is due to 
incompleteness and the 'spike' at around 0.15 Jy at 250 /im is due to spurious detections and Eddington bias in the BGS-Wide region. The 
excellent agreement between the catalogues estimated in BLAST maps (points and solid lines) and simulations (shaded regions) provides 
a satisfactory cross-check of our P{D) analysis approach. Using this method we recover counts which are consistent with the dashed lines. 



relations, and lead to ~ 10% changes in the width of 
the P{D) histograms, together with a somewhat more 
extended tail. The reason that the impact of cluster- 
ing is not larger is because most of the effect comes from 
sources which are considerably fainter than the confusion 
limit. There are two consequences of this: these faint 
sources have a relatively shallow number counts slope; 
and the faintest sources are less strongly clustered than 
the brighter ones . Sim ple scaling of the calculations of 
iTakeuchi fc Ishiil (|2004[ ) suggest that the effects will be 
no larger for BLAST than for the other surveys which 
they simulate. 

We have carried out a determination of the clustering 
of BLAST sources in Viero et al. (2009), and fit this to 
models. We show in Appendix A how to take a model 
for P(fc), derived from w{0), and estimate the effect of 
clustering on the P{D) distribution. In particular the 
effect on the width of the distribution is relatively easy 
to estimate, under a set of fairly reasonable assumptions. 
We find that the width of the distribution is increased by 
13%, 14% and 20% at 250, 350 and 500 /xm, respectively 
After re-convolving the map with the beam kernel, the 
effect of clustering on the width of the distribution be- 
comes more important since the clusturing signal is large 
scale, and hense the Poisson distribution get more re- 
duced by the convolution. In that case, we find that the 
width is increased by 28%, 25% and 30%. These values 
are low enough compared with the uncertainties that we 
are justified in neglecting the effects of clustering on the 
counts; in particular, it seems that the effect is at most 
of the order of one sigma for the fainter fiux density bins. 
However, for more precise estimates coming from future 
data-sets, it may be that such clustering effects will need 
to be fully considered. 

7. CONCLUSION 

We provide measurements of differential number 
counts at 250, 350, and 500 /im from the statistical analy- 
sis of BLAST maps using a maximum likelihood method 
based on 1-point statistics. We show that in the SNR 
regime of BLAST and future surveys, this method is bet- 
ter suited than counting individual sources, even when 
they are relatively bright. This is because it naturally 
allows for the correction of strong biases due to confusion 



and fiux boosting. This technique also has the advantage 
of providing an unbiased estimate of the counts at flux 
densities well below the limit at which sources can be 
detected individually. The method has been optimized 
to deal with inhomogeneous noise across the map and 
filtering to suppress large scale noise. 

Wc measure the counts at a few fiux nodes connected 
by power laws, covering a wide range of fiuxes, and per- 
form careful analysis of the resulting uncertainties using 
a Markov Chain Monte Carlo approach. We observe a 
very steep slope for the counts at intermediate flux densi- 
ties (approximately in the interval 0.02-0.5 Jy) of about 
—3.7 at 250 /im and —4.5 at 350 and 500 /im, indicat- 
ing strong galaxy evolution. We also detect a faint end 
break at all three wavelengths at about 0.015 Jy. Ad- 
ditionally we observe a change toward a shallower slope 
at the bright end of the counts, particularly at 250 /im, 
consistent with an approach to the Euclidean regime. 

The estimates and uncertainties we provide can be used 
for fitting to specific models. In addition the formalism 
presented in this paper can also be applied directly to 
predict the histogra m from paramet r ized p hysical models 
like those derived in Lagache et al. (" 20031 ). Comparison 
to the models of .Lagache ct al. (2003) show a striking 
agreement given that the model has not been tuned to 
fit these data, however we find fewer sources than ex- 
pected by the model at 250 and 350 /j-m and the slope of 
counts is steeper than expected at all three wavelengths. 
Perhaps the contribution of quiescent sources have been 
overestimated by the models. 

We show that our method provides near optimal results 
when applied to a map filtered with the beam kernel. 
Nevertheless, in the current approach each map is treated 
independently, and consequently some of the informa- 
tion contained in the combined data is not used. Since 
the same sources are statistically detected at the three 
BLAST wavelengths, but with amplitude ratios which 
depend on the intrinsic spectra and redshifts, then one 
could use this additional information to constrain more 
comprehensive models for the underlying sources. The 
natural extension of our method would be the general- 
ization to three-dimensional histogram fitting of the form 
P{Di, D2, D3). The additional cross-band 1-point infor- 
mation brings additional constraints on a combination 
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of redshift evolution of the sources and spectral energy 
distribution shapes. Of course one could also imagine 
an even more general fitting procedure whi ch also uses 
the 2- point statistics of the sort described in lViero et al.l 
([2009), including the cross-band clustering signals. 

This paper provides the details required for developing 
the powerful P{D) technique to estimate counts from 
the much more extensive data that will come from the 
SPIRE and PACS instruments on the Herschel satellite, 
as well as point sources in the Planck data. 
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APPENDIX 
ESTIMATE OF CLUSTERING EFFECT 



The expression for th e variance of the pixel value distribution in the presence of clustering is derived in Eq. 53 of 
iTakeuchi fc Tshiil (pOOi ). and can be written as: 



a = CTp + cr^ . 



Here dp is the variance of pure Poisson fluctuations, and a\ the excess variance due to clustering: 



-1 = 



S'^j[vfn{S)dSd^ 



f f f f SiS2firi)f{v2)n{Si)n{S2)w2{ri,r2)dS,dS2d\id\2- 



(Al) 



(A2) 



(A3) 



W2(v-i,r2) is the 2-dimensional 2-point correlation function for positions ri and r2. Usually, the variance is evaluated 
up to a cutoff Xc in the observed signal given by a; = S/f{r). Thus the integrals must be expressed in terms of xi and 
X2, and as a consequence the evaluat ion o f th e va riance requires the computation of a 6-D integral. 

In our case, we can compute Eqs IA2I and IA3I up to a cutoff in source signal S that we choose to be relatively 
high (e.g. 5 times the noise rms). This is a very good approximation for the computation of the map variance after 
extracting very bright sources, since those are easily identified, and number counts are steep enough that such bright 
sources do not contribute widely to the variance. Since the sky is isotropic and homogeneous, the 2-point correlation 
function depends only on the angular separation 9 = |ri — r2|, and then the variances can be expressed in Fourier 
space: 

(A4) 



S n{S)dSj J P(|k|)/(k)^rf-'k. 

Here -P(|k|) is the 2-dimensional power spectrum of clustering, and is equivalent to the Fourier transform of W2{0), 
while /(k) is the Fourier transform of the beam function /(r). The excess rms of the P{D) distribution due to 
clustering (as given in § 16. 4[) can then be computed through the following expression: 



(A5) 
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